
About the Provider

Qwen is an AI model family developed by Alibaba Group, a major Chinese technology and cloud-computing company. Through the Qwen initiative, Alibaba builds and open-sources advanced language, vision, and coding models under permissive licenses to support innovation, developer tooling, and scalable AI integration across applications.

Model Quickstart

This section helps you quickly get started with the Qwen/Qwen3-Coder-30B-A3B-Instruct model on the Qubrid AI inferencing platform. To use this model, you need:
  • A valid Qubrid API key
  • Access to the Qubrid inference API
  • Basic knowledge of making API requests in your preferred language
Once authenticated with your API key, you can send inference requests to the Qwen/Qwen3-Coder-30B-A3B-Instruct model and receive responses based on your input prompts. The example below shows how the model can be accessed; adapt it to whichever environment best fits your workflow.
from openai import OpenAI

# Initialize the OpenAI client with Qubrid base URL
client = OpenAI(
  base_url="https://platform.qubrid.com/v1",
  api_key="QUBRID_API_KEY",
)

# Create a streaming chat completion
stream = client.chat.completions.create(
  model="Qwen/Qwen3-Coder-30B-A3B-Instruct",
  messages=[
    {
      "role": "user",
      "content": "Write a Python function to calculate fibonacci sequence"
    }
  ],
  max_tokens=65536,
  temperature=0.7,
  top_p=0.8,
  stream=True
)

# Streaming output: print tokens as they arrive
for chunk in stream:
  if chunk.choices and chunk.choices[0].delta.content:
    print(chunk.choices[0].delta.content, end="", flush=True)
print()

# Non-streaming alternative: if you set stream=False above, remove the
# loop and read the completed response in one call instead:
# print(stream.choices[0].message.content)
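If you prefer not to use the OpenAI SDK, the same OpenAI-compatible endpoint can be called over raw HTTP. The sketch below uses only the Python standard library; the URL and header names mirror the SDK example above, and the request is only sent when a QUBRID_API_KEY environment variable is set.

```python
import json
import os
import urllib.request

API_KEY = os.environ.get("QUBRID_API_KEY", "")
URL = "https://platform.qubrid.com/v1/chat/completions"

# Same request body the SDK builds under the hood
payload = {
    "model": "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    "messages": [
        {"role": "user", "content": "Write a Python function to calculate fibonacci sequence"}
    ],
    "max_tokens": 65536,
    "temperature": 0.7,
    "top_p": 0.8,
    "stream": False,  # set True for server-sent streaming chunks
}

if API_KEY:  # only send the request when a key is configured
    req = urllib.request.Request(
        URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```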
This will produce a response similar to the one below:
Here are several Python implementations of the Fibonacci sequence calculation:

## 1. Basic Recursive Approach (Simple but Inefficient)

def fibonacci_recursive(n):
    if n <= 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci_recursive(n-1) + fibonacci_recursive(n-2)

## 2. Iterative Approach (Efficient)

def fibonacci_iterative(n):
    if n <= 0:
        return 0
    elif n == 1:
        return 1
    a, b = 0, 1
    for _ in range(2, n + 1):
        a, b = b, a + b
    return b

## Recommended Approach:
For most practical purposes, use the iterative approach because it's:
- Fast (O(n) time complexity)
- Memory efficient (O(1) space complexity)
- Easy to understand and implement

Model Overview

Qwen3 Coder 30B A3B is a large causal language model designed for code generation and technical reasoning. It belongs to the latest generation of the Qwen model family and supports both thinking mode for complex reasoning and non-thinking mode for efficient general usage within the same model. The model is built using a Mixture-of-Experts (MoE) architecture, activating only a subset of parameters per request to balance performance and efficiency. It is trained through both pretraining and post-training stages and supports long context lengths for complex coding and reasoning workflows.
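To make the MoE idea concrete, here is a toy illustration (not the real model) of expert routing: a gate scores every expert for a token, only the top-k experts run, and their gate weights are renormalized. The expert and top-k counts match the spec table below; the gate scores here are fake, deterministic values.

```python
import math

NUM_EXPERTS = 128  # experts in the MoE layer
TOP_K = 8          # experts activated per forward pass

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_scores, k=TOP_K):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(gate_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    chosen_mass = sum(probs[i] for i in top)
    return {i: probs[i] / chosen_mass for i in top}

# Fake gate scores for one token; only 8 of 128 experts receive weight
scores = [math.sin(i) for i in range(NUM_EXPERTS)]
weights = route(scores)
```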

Model at a Glance

Feature | Details
Model ID | Qwen/Qwen3-Coder-30B-A3B-Instruct
Provider | Qwen
Model Type | Causal Language Model
Architecture | Mixture-of-Experts (MoE) Transformer, 48 layers, GQA attention, 128 experts (8 active per forward pass)
Total Parameters | 30.5B
Activated Parameters | 3.3B

When to use?

You should consider using Qwen3 Coder 30B A3B if:
  • Your application focuses on code generation or technical reasoning
  • You need long context support for large codebases or complex prompts
  • You want a model that can switch between deep reasoning and efficient responses
  • Your workflow includes agent-based tasks with external tools
  • You require multilingual support for technical or coding tasks

Inference Parameters

Parameter Name | Type | Default | Description
Streaming | boolean | true | Enable streaming responses for real-time output.
Temperature | number | 0.7 | Controls randomness; higher values produce more diverse, less deterministic output.
Max Tokens | number | 65536 | Maximum tokens to generate in the response, suitable for long-form code or large refactors.
Top P | number | 0.8 | Nucleus sampling threshold controlling token-sampling diversity.
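As a sketch of how these parameters map onto a request body, the hypothetical helper below (not part of any SDK) starts from the table's defaults and lets callers override any of them; unknown parameter names are rejected.

```python
# Defaults taken from the inference-parameter table above
DEFAULTS = {
    "stream": True,
    "temperature": 0.7,
    "max_tokens": 65536,
    "top_p": 0.8,
}

def build_request(model, messages, **overrides):
    """Build a chat-completion request body, overriding table defaults."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # overrides are merged last, so they win over the defaults
    return {"model": model, "messages": messages, **DEFAULTS, **overrides}

body = build_request(
    "Qwen/Qwen3-Coder-30B-A3B-Instruct",
    [{"role": "user", "content": "Refactor this function"}],
    temperature=0.2,  # lower randomness for deterministic refactors
    stream=False,
)
```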

Key Features

  • Supports thinking mode for complex reasoning, mathematics, and coding
  • Supports non-thinking mode for efficient general-purpose dialogue
  • Strong performance in code generation, technical reasoning, and logical tasks
  • Designed for agent workflows with tool integration
  • Supports multilingual instruction following and translation

Best Practices

Sampling Settings

Thinking Mode

With enable_thinking = true:
  • Temperature: 0.6
  • Top-P: 0.95
  • Top-K: 20
  • Min-P: 0
Avoid greedy decoding to prevent repetition and degraded performance.

Non-Thinking Mode

With enable_thinking = false:
  • Temperature: 0.7
  • Top-P: 0.8
  • Top-K: 20
  • Min-P: 0
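The two presets above can be kept in a small lookup and passed through the OpenAI SDK. Note that top_k and min_p are not standard OpenAI fields; the sketch below routes them through the SDK's extra_body parameter, assuming the Qubrid endpoint forwards them to the backend.

```python
# Recommended sampling presets, keyed by enable_thinking
SAMPLING_PRESETS = {
    True:  {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "min_p": 0},  # thinking
    False: {"temperature": 0.7, "top_p": 0.8,  "top_k": 20, "min_p": 0},  # non-thinking
}

def sampling_kwargs(enable_thinking):
    """Keyword arguments for client.chat.completions.create(...)."""
    preset = SAMPLING_PRESETS[enable_thinking]
    return {
        "temperature": preset["temperature"],
        "top_p": preset["top_p"],
        # non-standard fields go through extra_body (assumed to be forwarded)
        "extra_body": {"top_k": preset["top_k"], "min_p": preset["min_p"]},
    }

kwargs = sampling_kwargs(enable_thinking=True)
```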

Output Length

  • Recommended output length: 32,768 tokens
  • For highly complex math or programming problems: 38,912 tokens

Prompt Standardization

Math Problems

Include the following instruction:
Please reason step by step, and put your final answer within \boxed{}.
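A minimal sketch of applying this standardization: append the instruction to the user's math problem before building the message list. The helper name is illustrative, not part of any SDK.

```python
# Standard instruction from the prompt-standardization guidance above
MATH_SUFFIX = "Please reason step by step, and put your final answer within \\boxed{}."

def make_math_messages(problem):
    """Build a single-turn message list with the math instruction appended."""
    return [{"role": "user", "content": f"{problem}\n{MATH_SUFFIX}"}]

messages = make_math_messages("What is 17 * 24?")
```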

Multi-Turn Conversations

  • Historical responses should include only the final output
  • Thinking content should not be stored in conversation history
  • This behavior is handled automatically in the provided Jinja2 chat template
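If you manage history manually rather than relying on the chat template, the sketch below strips reasoning content before appending a reply to the history. It assumes thinking content is wrapped in <think>...</think> blocks, the Qwen convention.

```python
import re

# Matches a <think>...</think> block plus any trailing whitespace
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)

def strip_thinking(text):
    """Remove thinking content so only the final output is kept."""
    return THINK_RE.sub("", text).strip()

def append_to_history(history, assistant_reply):
    """Store only the final answer in the conversation history."""
    history.append({"role": "assistant", "content": strip_thinking(assistant_reply)})
    return history

history = append_to_history([], "<think>2+2 is trivial.</think>The answer is 4.")
```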

Summary

Qwen3 Coder 30B A3B is a Mixture-of-Experts language model optimized for code generation and technical reasoning. It supports both thinking and non-thinking modes in a single deployment, provides long-context support up to 131K tokens with YaRN, and is designed for efficient, multilingual, and agent-based inferencing workflows.